
leaderboard: per-dataset metric derivation, chip filters + metric toggle on every table#29

Merged: radinhamidi merged 2 commits into main from leaderboard/critical-table-revision on May 20, 2026

@radinhamidi
Member

Summary

Comprehensive table-quality pass across every leaderboard page, driven by issues found in the latest review:

  • No more phantom metric columns. /datasets/[id] used to render whatever eval_metrics the dataset registry listed (MAP on TREC DL, recall_1000 on BEIR), regardless of whether those metrics actually appeared in the data. The per-dataset shard now derives its columns from the actual run rows, the same approach the home matrix uses.
  • Chip filters on every per-X page, not just the home page. /datasets/[id] gets Method/Model/Retriever/Metric; /methods/[id] gets Model/Retriever/Metric; /models/[id] gets Method/Retriever/Metric; /retrievers/[id] gets Method/Model/Metric. Behavior matches the home page (chip → qg-chip-hidden → qg-itable-reapply handshake).
  • Metric toggle everywhere. Per-method/model/retriever pages now expose both primary (nDCG@10) and secondary (R@1k or R@100) per dataset column, swapped via the Metric chip.
  • Pretty labels everywhere. Dataset short labels + METRIC_LABEL (ndcg_cut_10 → nDCG@10, recall_1000 → R@1k, recall_100 → R@100, map → MAP) on /datasets/[id] and the per-X pages too.
  • Drop the ugly inner scrollbar. Home + /datasets/[id] no longer set max-h-[70vh] overflow-y-auto. The page scrolls naturally; sticky top-0 thead sticks to the viewport.
  • /models index renders the display label (gpt-4.1) not the provider-prefixed id (openai/gpt-4.1) — matches the /methods index convention.
  • /runs/[run_id] reproduce snippet rebuilt against the real example pipeline. Pyserini index names no longer have the spurious .flat.splade-pp-ed / .flat.bge-base-en-v1.5 for non-lexical paradigms; trec_eval references the qrels key from the dataset registry, not the topics key.
  • /runs/[run_id] Method field shows the display name (Q2D (FS) etc.) not the raw method_id.
  • /about no longer claims every row ships a .run.txt and queries.tsv; those are optional under the current schema. The documented path now includes the {retriever} segment added in PR #20 (Schema: optional artifacts + DL-HARD dataset entry).
  • Replaces duplicate cell + chip-bar code across 5 pages with two shared components: MatrixCell.astro (link + primary/secondary spans + sort hooks) and FilterChips.astro (groups + metric special-case + reapply event).
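
To make the "no phantom columns" and "pretty labels" items concrete, here is a minimal TypeScript sketch of deriving metric columns from the run rows rather than from the registry. The `RunRow` shape, `deriveMetricColumns`, `metricLabel`, and `METRIC_ORDER` are hypothetical names for illustration and may not match the code in this PR. Only the four label mappings come from the description above, and the sample values are made up.

```typescript
// Hypothetical row shape: one entry per run, holding whatever metrics it reported.
interface RunRow {
  run_id: string;
  metrics: { [metric: string]: number | null | undefined };
}

// Label mappings as listed in the PR summary.
const METRIC_LABEL: { [metric: string]: string } = {
  ndcg_cut_10: "nDCG@10",
  recall_1000: "R@1k",
  recall_100: "R@100",
  map: "MAP",
};

// Preferred column order (an assumption made for this sketch).
const METRIC_ORDER: string[] = ["ndcg_cut_10", "recall_1000", "recall_100", "map"];

// Return only the metrics that at least one run actually reports a number for.
function deriveMetricColumns(rows: RunRow[]): string[] {
  const seen: { [metric: string]: boolean } = {};
  for (const row of rows) {
    for (const key of Object.keys(row.metrics)) {
      const value = row.metrics[key];
      if (typeof value === "number" && isFinite(value)) {
        seen[key] = true;
      }
    }
  }
  // Known metrics first in the preferred order, then any others alphabetically.
  const known = METRIC_ORDER.filter((m) => seen[m] === true);
  const extra = Object.keys(seen)
    .filter((m) => METRIC_ORDER.indexOf(m) === -1)
    .sort();
  return known.concat(extra);
}

// Fall back to the raw key when no display label is registered.
function metricLabel(metric: string): string {
  const label = METRIC_LABEL[metric];
  return typeof label === "string" ? label : metric;
}

// Illustrative rows with made-up values: no run reports a usable recall_1000.
const sampleRows: RunRow[] = [
  { run_id: "run-a", metrics: { ndcg_cut_10: 0.5, recall_100: 0.9 } },
  { run_id: "run-b", metrics: { ndcg_cut_10: 0.6, recall_100: 0.8, recall_1000: null } },
];
const sampleColumns = deriveMetricColumns(sampleRows);
const sampleHeaders = sampleColumns.map(metricLabel);
```

Because the column list is computed from the rows, a metric that the registry lists but no run reports (recall_1000 here) never becomes a header.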

Test plan

  • python -m pytest reproducibility/tests/ — 44/44 passing
  • pnpm --filter @qg/leaderboard build — clean (1113 pages built)
  • /datasets/beir-v1.0.0-scifact: single metric column with nDCG@10 + R@100 toggle, no recall_1000 phantom column
  • /datasets/msmarco-v1-passage.trecdl2019: no MAP phantom column
  • /models/ index card titles show display labels (gpt-4.1, Qwen2.5-72B-Instruct…)
  • /runs/* Method field shows Q2D (FS) / Q2D (COT) for query2doc variants
  • /runs/* reproduce snippet generates beir-v1.0.0-trec-covid.splade-pp-ed, not .flat.splade-pp-ed
  • Home page produces no max-h-[70vh] wrapper
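
The index-name check above can be sketched as a small helper. This is a hedged illustration, not the PR's actual snippet generator: `Paradigm`, `RetrieverSpec`, and `prebuiltIndexName` are hypothetical names, and the sketch assumes BEIR-style dataset ids where the index name is the dataset id plus a suffix. The `.flat` suffix for lexical indexes is inferred from the bug description, not stated outright.

```typescript
// Hypothetical retriever description; the real registry fields may differ.
type Paradigm = "lexical" | "sparse" | "dense";

interface RetrieverSpec {
  paradigm: Paradigm;
  encoder?: string; // e.g. "splade-pp-ed" or "bge-base-en-v1.5"
}

// Build an index name of the form <dataset>.<suffix>.
// Lexical indexes take the ".flat" suffix; learned retrievers take the
// encoder name alone, never ".flat.<encoder>".
function prebuiltIndexName(datasetId: string, retriever: RetrieverSpec): string {
  if (retriever.paradigm === "lexical") {
    return `${datasetId}.flat`;
  }
  if (!retriever.encoder) {
    throw new Error(`retriever with paradigm "${retriever.paradigm}" needs an encoder name`);
  }
  return `${datasetId}.${retriever.encoder}`;
}

const spladeIndex = prebuiltIndexName("beir-v1.0.0-trec-covid", {
  paradigm: "sparse",
  encoder: "splade-pp-ed",
});
```

Here `spladeIndex` is `beir-v1.0.0-trec-covid.splade-pp-ed`, matching the expectation in the test plan.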

🤖 Generated with Claude Code

radinhamidi and others added 2 commits May 20, 2026 03:17
…ric toggle on every table

- per-dataset shard reads metrics from actual runs (no MAP/recall_1000 phantom columns)
- shared FilterChips + MatrixCell components reused across home / dataset / method / model / retriever pages
- every per-X table gets chip filters (method/model/retriever/metric as applicable) + metric toggle
- pretty metric labels (nDCG@10, R@1k, R@100, MAP) everywhere
- drop double scrollbar on home + per-dataset tables
- /models index renders display label, not provider-prefixed id
- /runs page shows method display name; reproduce snippet aligned to example pipeline with correct Pyserini index names and qrels-based trec_eval
- /about page no longer claims run.txt/queries.tsv are guaranteed; path includes retriever segment

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…ed filter card

- Wrap every table in a fixed-height card with a styled 8px thin scrollbar so the page chrome stays in view while rows scroll
- Sticky thead inside the scroll container; sticky leftmost axis columns (Method/Model/Retriever, varies per page) with CSS-var-driven widths and a mobile fallback
- Inline sort arrows on stacked dataset/metric column headers via a slot the table wires into
- Filter chips moved into a dedicated card; metric toggle now also re-fires the current sort so row order matches the visible metric
- MatrixCell always renders both metric spans (em-dash for missing) and uses the new .qg-cell-best highlight (accent + dark-mode glow)
- Decimal precision unified at 4 across MatrixCell, side-by-side dataset cells, and the run-detail metrics table
- /datasets/[id] renders both metrics side by side instead of a single-column toggle
- /datasets/ index drops the stale eval_metrics badge
- /runs/[run_id] reproduce snippet simplifies the qrels lookup
- Stat cards gain hover:border-qg-accent; InteractiveTable search input restyled with magnifier icon; MetricCell removed (dead code)

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@radinhamidi
Member Author

Pushed a follow-up commit (2f6b74f) addressing the table-mechanics critique:

  • Sticky thead + sticky leftmost axis columns (Method/Model/Retriever per page) with themed thin scrollbar inside a fixed-height table card.
  • Filter chips wrapped in a dedicated card; metric toggle now re-fires the current sort so rows match the visible metric.
  • MatrixCell always renders both metric spans; precision unified at 4 digits everywhere; new .qg-cell-best highlight (accent + dark-mode glow).
  • /datasets/[id] renders both metrics side by side instead of a one-column toggle.
  • /datasets/ index drops the stale metrics: badge; /runs/[run_id] reproduce snippet simplifies the qrels lookup.
  • Sort arrows on stacked dataset/metric headers now sit inline with the dataset name (slot-based, no longer drop to a new line).
  • Stat cards gain hover-border polish; deleted dead MetricCell.astro.
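
Two of the follow-up items lend themselves to a short sketch: cells always render both metric values at four decimals with a dash for missing values, and toggling the metric re-sorts rows by the visible one. The TypeScript below is an approximation under stated assumptions, not the code in MatrixCell.astro: `Cell`, `Row`, `formatMetric`, and `sortByVisibleMetric` are hypothetical names, and sinking missing values to the bottom is a choice made for this example.

```typescript
type MetricKind = "primary" | "secondary";

// Hypothetical cell model: both metrics are always present, possibly null.
interface Cell {
  primary: number | null;
  secondary: number | null;
}

interface Row {
  label: string;
  cell: Cell;
}

const MISSING = "\u2014"; // em-dash placeholder shown for a missing metric

// Unified precision: four decimal places everywhere.
function formatMetric(value: number | null): string {
  return value === null ? MISSING : value.toFixed(4);
}

// Render both spans regardless of which metric is currently visible.
function renderCell(cell: Cell): { primary: string; secondary: string } {
  return {
    primary: formatMetric(cell.primary),
    secondary: formatMetric(cell.secondary),
  };
}

// Re-sort by whichever metric is visible; call this again after the toggle
// so that row order matches the displayed numbers.
function sortByVisibleMetric(rows: Row[], visible: MetricKind, descending: boolean): Row[] {
  const sign = descending ? -1 : 1;
  return rows.slice().sort((a, b) => {
    const x = a.cell[visible];
    const y = b.cell[visible];
    if (x === null && y === null) return 0;
    if (x === null) return 1; // missing values sink to the bottom
    if (y === null) return -1;
    return sign * (x - y);
  });
}

// Made-up values for illustration only.
const demoRows: Row[] = [
  { label: "A", cell: { primary: 0.5, secondary: 0.9 } },
  { label: "B", cell: { primary: 0.7, secondary: 0.8 } },
  { label: "C", cell: { primary: null, secondary: 0.95 } },
];
const byPrimary = sortByVisibleMetric(demoRows, "primary", true).map((r) => r.label);
const bySecondary = sortByVisibleMetric(demoRows, "secondary", true).map((r) => r.label);
```

Sorting a copy (`rows.slice()`) keeps the original order available, so a toggle can re-sort without losing it.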

@radinhamidi radinhamidi merged commit e60fa48 into main May 20, 2026
2 checks passed
@radinhamidi radinhamidi deleted the leaderboard/critical-table-revision branch May 20, 2026 18:39